Conversation

@bsatoriu (Contributor) commented Dec 2, 2025

Synchronize the latest maap staging.values.yaml updates to prod.values.yaml.

github-actions bot commented Dec 2, 2025

Merging this PR will trigger the following deployment actions.

Support deployments

| Cloud Provider | Cluster Name | Reason for Redeploy |
| --- | --- | --- |
| aws | victor | Support helm chart has been modified |
| aws | openscapeshub | Support helm chart has been modified |
| aws | maap | Support helm chart has been modified |
| gcp | 2i2c-uk | Support helm chart has been modified |
| aws | earthscope | Support helm chart has been modified |
| aws | projectpythia | Support helm chart has been modified |
| aws | jupyter-health | Support helm chart has been modified |
| gcp | catalystproject-latam | Support helm chart has been modified |
| kubeconfig | projectpythia-binder | Support helm chart has been modified |
| gcp | 2i2c | Support helm chart has been modified |
| aws | disasters | Support helm chart has been modified |
| gcp | dubois | Support helm chart has been modified |
| gcp | leap | Support helm chart has been modified |
| aws | oceanhackweek | Support helm chart has been modified |
| aws | aimatx-2i2c-hub | Support helm chart has been modified |
| aws | reflective | Support helm chart has been modified |
| aws | opensci | Support helm chart has been modified |
| gcp | awi-ciroh | Support helm chart has been modified |
| aws | 2i2c-aws-us | Support helm chart has been modified |
| aws | nasa-ghg-hub | Support helm chart has been modified |
| kubeconfig | 2i2c-jetstream2 | Support helm chart has been modified |
| aws | nmfs-openscapes | Support helm chart has been modified |
| gcp | cloudbank | Support helm chart has been modified |
| kubeconfig | utoronto | Support helm chart has been modified |
| aws | catalystproject-africa | Support helm chart has been modified |
| aws | nasa-veda | Support helm chart has been modified |
| gcp | hhmi | Support helm chart has been modified |
| aws | ubc-eoas | Support helm chart has been modified |
| aws | temple | Support helm chart has been modified |
| aws | smithsonian | Support helm chart has been modified |
| aws | bnext-bio | Support helm chart has been modified |
| aws | berkeley-geojupyter | Support helm chart has been modified |
| aws | nasa-cryo | Support helm chart has been modified |
| aws | strudel | Support helm chart has been modified |

Staging deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
| --- | --- | --- | --- |
| aws | maap | staging | Following helm chart values files were modified: staging.values.yaml, common.values.yaml |
| aws | earthscope | staging | Following helm chart values files were modified: common.values.yaml |
| gcp | 2i2c-uk | staging | Following prod hubs require redeploy: lis |

Production deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
| --- | --- | --- | --- |
| aws | maap | prod | Following helm chart values files were modified: prod.values.yaml, common.values.yaml |
| gcp | 2i2c-uk | lis | Following helm chart values files were modified: lis.values.yaml |
| aws | earthscope | prod | Following helm chart values files were modified: common.values.yaml |
| aws | earthscope | binder | Following helm chart values files were modified: common.values.yaml |

@grallewellyn (Contributor)

@bsatoriu, the images should be pointing to OPS, not DIT, for prod, and we want to add QGIS back in. If you give me write access to your fork I can make the updates; that saves me from having to make my own PR today.

@bsatoriu (Contributor, Author) commented Dec 2, 2025

> @bsatoriu, the images should be pointing to OPS, not DIT, for prod, and we want to add QGIS back in. If you give me write access to your fork I can make the updates; that saves me from having to make my own PR today.

I added you, @grallewellyn.

@grallewellyn (Contributor)

Thanks, Brian! I made the necessary updates.

@yuvipanda (Member) left a comment

To prevent drift between staging and prod, we want to keep config in common.yaml as much as possible. So the workflow can be:

  1. Test things in staging via staging-specific config
  2. When they are ready, move them to common.yaml so that it becomes the behavior of both staging and prod

This way we minimize the config that's prod-specific, and can use staging to validate issues.

Can you move most of the config to common.yaml rather than prod.yaml, and I'll merge?
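
A minimal sketch of that workflow, assuming the chart merges common.values.yaml with the per-hub staging/prod files as described in this thread; SOME_FEATURE_FLAG is a hypothetical placeholder and the nesting above singleuser is abbreviated:

```yaml
# staging.values.yaml -- while a change is being validated, it lives only here
singleuser:
  extraEnv:
    SOME_FEATURE_FLAG: "on"   # hypothetical setting under test
---
# common.values.yaml -- once validated on staging, the same block moves here,
# so both the staging and prod hubs pick it up on their next deploy
singleuser:
  extraEnv:
    SOME_FEATURE_FLAG: "on"
```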

@bsatoriu (Contributor, Author) commented Dec 3, 2025

> To prevent drift between staging and prod, we want to keep config in common.yaml as much as possible. So the workflow can be:
>
>   1. Test things in staging via staging-specific config
>   2. When they are ready, move them to common.yaml so that it becomes the behavior of both staging and prod
>
> This way we minimize the config that's prod-specific, and can use staging to validate issues.
>
> Can you move most of the config to common.yaml rather than prod.yaml, and I'll merge?

This is complete.

@yuvipanda (Member)

@bsatoriu I still see the changes in prod.yaml.

@yuvipanda (Member) commented Dec 3, 2025

Cross-posting from Slack, after @bsatoriu nudged me to point out that the changes to prod are actually required.


The primary difference between staging and prod after your latest changes is:

  1. env vars set to differentiate between staging and prod
  2. the images themselves (prod has pinned tags, staging has 'develop')

In our experience, (2) is often used to 'test' new images before they get rolled out. This is often better achieved by having people type the image tag into the 'unlisted image' option in prod, and keeping staging and prod on the exact same images instead. This lets people test new images in prod without affecting others, and makes sure staging and prod are as close a match as possible, rather than using staging as almost a 'development' instance. In the future, for example, if you're experimenting with s3fuse on staging, making sure the images are the same as prod's cuts down on a lot of potential issues when ramping up.

So my suggestion is:

  1. The env vars should be put in a place that's common to all profiles: singleuser.extraEnv. MAAP_API_HOST seems to be the same for both staging and prod, so it can go in common.yaml. WORKSPACE_BUCKET seems different, so it can go in the respective staging or prod yaml files (since singleuser.extraEnv is a dict, it'll be merged). DOCKERIMAGE_PATH_DEFAULT and DOCKERIMAGE_PATH_BASE_IMAGE seem to be the names of the images used, which are also available as $(JUPYTER_IMAGE) (that's Kubernetes syntax - see https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/), so you can set those in common.yaml as well. This also allows people to experiment with images using unlisted choice without needing changes here.
  2. Keep all the images pinned to tags, and put them in common.yaml. For testing new images, use unlisted choice. (A sketch of this layout follows the list.)
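
Under those suggestions, the split could look roughly like this. The staging bucket name is illustrative (only the prod value appears in this PR), and the nesting above singleuser is abbreviated:

```yaml
# common.values.yaml -- shared by staging and prod
singleuser:
  extraEnv:
    MAAP_API_HOST: api.maap-project.org
    DOCKERIMAGE_PATH_DEFAULT: "$(JUPYTER_IMAGE)"   # resolves to the running image; see the review thread below
---
# prod.values.yaml -- only what genuinely differs; extraEnv dicts get merged
singleuser:
  extraEnv:
    WORKSPACE_BUCKET: maap-ops-workspace
---
# staging.values.yaml
singleuser:
  extraEnv:
    WORKSPACE_BUCKET: maap-staging-workspace   # illustrative placeholder
```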

Review thread on this hunk:

```yaml
extraEnv:
  SCRATCH_BUCKET: s3://maap-scratch-prod/$(JUPYTERHUB_USER)
  MAAP_API_HOST: api.maap-project.org
  DOCKERIMAGE_PATH_DEFAULT: mas.maap-project.org/root/maap-workspaces/custom_images/maap_base:v5.0.0
```
Member:

This will set the environment variable to this value no matter what image is used. Is that what was expected? In the last PR, I saw this was set to be the same as the name of the image, in which case it should use $(JUPYTER_IMAGE) as the value.
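
Concretely, the two options being contrasted here; the second assumes, per this comment, that KubeSpawner exposes the running image to the pod as JUPYTER_IMAGE:

```yaml
# Hardcoded, as in the hunk above: the env var keeps pointing at maap_base:v5.0.0
# even when the pod was launched from a different profile image.
DOCKERIMAGE_PATH_DEFAULT: mas.maap-project.org/root/maap-workspaces/custom_images/maap_base:v5.0.0
---
# Dependent variable: $(JUPYTER_IMAGE) is Kubernetes syntax that is substituted at
# pod creation with the value of JUPYTER_IMAGE, i.e. whatever image the pod runs.
DOCKERIMAGE_PATH_DEFAULT: "$(JUPYTER_IMAGE)"
```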

Contributor:

Okay, extracted out DOCKERIMAGE_PATH_BASE_IMAGE!

Review thread on this hunk:

```yaml
    WORKSPACE_BUCKET: maap-ops-workspace
  nodeSelector:
    2i2c/hub-name: prod
  profileList:
```
Member:

Normally, we would like to keep profileLists in common.yaml, and use the same image in staging and prod. The staging hub here is primarily for testing infrastructure changes, and we (2i2c) would like to generally keep it the exact same as prod, so that if we have tested something on staging, we're 99% confident it will work in prod.

Having different images in staging and prod could cause problems here, since the images being different could cause failures when migrating. It could also cause the other parts of the profileList (such as resource config) to drift out of sync between the two.

However, we also recognize that you probably want to test out different images as you're onboarding an existing user base to this hub, and we want to be flexible.

So I see two paths forward:

  1. Use the same image tags for staging and prod, and put them in common.yaml. Image testing happens purely via unlisted choice. This is the preferred way, and also where we should go long term.
  2. If (1) doesn't fit with your existing workflows for building images, leave a block comment above the profileList config in staging and prod, documenting that it's duplicated and that whoever modifies it should take care that the only differences between the two are the image tags, with everything else kept in sync manually (a sketch of such a comment follows below). We can then revisit this in 3-6 months, after the initial migration is completed and the pace of image changes has changed.

I wanna unblock y'all asap, so while I have a preference for (1), I'm happy to do either.
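
If option (2) is taken, the duplicated sections could carry a note along these lines (the wording is only a suggestion):

```yaml
# NOTE: this profileList is intentionally duplicated between staging.values.yaml
# and prod.values.yaml. The ONLY intended difference between the two copies is
# the image tags; resource options, display names, and everything else must be
# kept in sync by hand. Revisit consolidating into common.values.yaml once the
# MAAP migration has settled.
profileList:
  # ...
```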

Contributor:

Sorry for jumping into this conversation as I come back from leave.

Let me know if I am phrasing this correctly: you are saying that staging and prod are meant for infrastructure testing and everything else remains the same. In that case, we (MAAP), as tenants of this infrastructure, should be deploying three versions of your prod configuration for our own customers and venues (DIT, UAT and OPS). The tenant should not need to worry about changes in your staging environment. We should be able to deploy multiple 2i2c prod environments with different MAAP configurations for our testing.

Does that make sense?

Contributor:

On MAAP, the DIT, UAT and OPS venues come with their associated deployments of the API and data-processing clusters, which affect the Jupyter extensions used in the images. So in terms of testing, we are not just testing the images, but the entire deployment venue, which is isolated in its own cloud environment.

Contributor:

I added a block comment above profileList, and we would like to go with option 2.

Member:

Thanks @grallewellyn! I've retitled the PR slightly and merged this!

@sujen1412 I opened #7233 to split off the other conversation so we don't lose track of it!

@yuvipanda changed the title from "Update maap prod.values.yaml" to "Promote MAAP staging hubs to prod" on Dec 4, 2025
@yuvipanda self-requested a review on December 4, 2025 01:16
@yuvipanda merged commit 09a9bd7 into 2i2c-org:main on Dec 4, 2025
44 checks passed
github-actions bot commented Dec 4, 2025

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/19914343488

@yuvipanda (Member)

The deployment is failing because the image mas.dit.maap-project.org/root/maap-workspaces/2i2c/pangeo:develop doesn't exist. Since it fails in staging, prod deployment will not occur.

This is another reason to keep prod and staging images the same, since staging is supposed to catch issues that affect prod. Here staging has caught an issue, but because the images differ we no longer know whether it affects prod or not (and vice versa: staging may succeed while prod fails).

So in the long run, each production environment should have its own staging where the images are the same.

@grallewellyn
Copy link
Contributor

Since the only difference between staging and production is the image tags, the only way the pipeline could fail is if the image doesn't exist. If the image doesn't exist, we can quickly push it and rerun the deployment. Once we get the hang of the 2i2c deployment process, this won't be an issue again.

If there is something wrong with the images themselves, then they won't launch on 2i2c, but that is a different issue.
